Info-fuzzy algorithms for mining dynamic data streams
Abstract
Most data mining algorithms assume static behavior of the incoming data. In the real world, however, most continuously collected data streams are generated by dynamic processes, which may change over time, in some cases drastically. A change in the underlying concept, known as concept drift, causes a data mining model generated from past examples to become less accurate and less relevant for classifying current data. Most online learning algorithms deal with concept drift by generating a new model every time a drift is detected. On one hand, this solution keeps the model accurate and relevant at all times, which increases classification accuracy. On the other hand, it suffers from a major drawback: the high computational cost of generating new models. The problem worsens as concept drift is detected more frequently, so a compromise between computational effort and accuracy is needed. This work describes a series of incremental algorithms that are shown empirically to produce more accurate classification models than batch algorithms in the presence of concept drift while being computationally cheaper than existing incremental methods. The proposed incremental algorithms are based on an advanced decision-tree learning methodology called "info-fuzzy network" (IFN), which is capable of inducing compact and accurate classification models. The algorithms are evaluated on real-world streams of traffic and intrusion detection data.
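The rebuild-on-drift strategy that the abstract attributes to most online learning algorithms can be sketched as follows. This is a minimal illustration under stated assumptions, not the IFN-based algorithms of the paper: `MajorityClassModel`, `run_stream`, the window size, and the error-rate threshold used as a drift test are all hypothetical stand-ins chosen for brevity.

```python
from collections import Counter


class MajorityClassModel:
    """Toy stand-in for a real classifier (NOT an info-fuzzy network).

    It always predicts the most common label seen in its training window.
    """

    def __init__(self, labels):
        self.prediction = Counter(labels).most_common(1)[0][0]

    def predict(self, _features):
        return self.prediction


def run_stream(stream, window_size=20, drift_threshold=0.5):
    """Process (features, label) pairs one window at a time.

    A new model is built from the latest window whenever the current
    model's error rate on that window exceeds drift_threshold. This
    error-rate check is an assumed, simplistic drift test.

    Returns (models_built, accuracy), where accuracy is measured on
    examples that arrived after the first model existed.
    """
    window = []
    model = None
    models_built = 0
    correct = 0
    total = 0

    for features, label in stream:
        # Test-then-train: score the example before it joins the window.
        if model is not None:
            total += 1
            correct += int(model.predict(features) == label)

        window.append((features, label))
        if len(window) < window_size:
            continue

        labels = [y for _, y in window]
        if model is None:
            drift = True  # no model yet, so build the first one
        else:
            errors = sum(model.predict(x) != y for x, y in window)
            drift = errors / window_size > drift_threshold

        if drift:
            # Full rebuild: this is the costly step the abstract refers to.
            model = MajorityClassModel(labels)
            models_built += 1
        window = []

    accuracy = correct / total if total else 0.0
    return models_built, accuracy
```

Calling `run_stream` on a stream whose label switches partway through triggers one rebuild after the switch, whereas a stream with a stable label builds only the initial model. Each rebuild is where the computational cost is paid, which is the cost that incremental updating aims to reduce.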
Similar resources
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Single-Pass Algorithms for Mining Frequency Change Patterns with Limited Space in Evolving Append-Only and Dynamic Transaction Data Streams
In this paper, we propose an online single-pass algorithm, MFC-append (Mining Frequency Change patterns in append-only data streams), for online mining of frequent frequency change items in continuous append-only data streams. An online space-efficient data structure called ChangeSketch is developed to provide fast response times when computing dynamic frequency changes between data streams. A modifie...
Online Mining Changes of Items over Continuous Append-only and Dynamic Data Streams
Online mining of changes over data streams has been recognized as an important task in data mining. Mining changes over data streams is both compelling and challenging. In this paper, we propose a new, single-pass algorithm, called MFC-append (Mining Frequency Changes of append-only data streams), for discovering the frequent frequency-changed items, vibrated frequency changed items, and stable...
An Overview of Algorithms Used for Mining Frequent Patterns in Data Streams
A data stream is an ordered sequence of items that arrive over time. It is impossible to store all of the data as the items arrive, so data mining algorithms are applied directly to streams instead of first storing them in a database. Real-time surveillance systems, telecommunication systems, sensor networks, financial applications, and transactional data are some examples of data stream syste...
A Review on Algorithms for Mining Frequent Itemset Over Data Stream
Frequent itemset mining over dynamic data is an important problem in data mining. The two main factors for a data stream mining algorithm are memory usage and runtime, since both are limited resources. Mining frequent patterns in data streams, as in traditional databases and many other types of databases, has been widely studied in data mining research. Many applications like stock ...
Journal title:
- Appl. Soft Comput.
Volume 8, Issue
Pages -
Publication date 2008